A Comparison of Features for Automatic Readability Assessment

Authors

  • Lijun Feng
  • Martin Jansche
  • Matt Huenerfauth
  • Noémie Elhadad

Abstract

Several sets of explanatory variables – including shallow, language modeling, POS, syntactic, and discourse features – are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity-density (a discourse feature) and POS-features, in particular nouns, are individually very useful but highly correlated. Average sentence length (a shallow feature) is more useful – and less expensive to compute – than individual syntactic features. A judicious combination of features examined here results in a significant improvement over the state of the art.
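
The sketch below shows two of the feature families the abstract compares. The first is a shallow feature, average sentence length. The second is a language-model feature: the per-word perplexity of a text under a model trained on each grade level's text. It is only a sketch under stated assumptions. The abstract does not say how sentences were split, how words were tokenized, or what kind of language model was used. The regex tokenizer and the add-one-smoothed unigram model here are simple stand-ins chosen for illustration, not the authors' method. The toy grade-level corpora are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, naive preprocessing: split sentences after . ! ? and treat
# runs of letters/apostrophes as words. Not the paper's tokenizer.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[A-Za-z']+")


def average_sentence_length(text: str) -> float:
    """Shallow feature: mean number of words per sentence."""
    sentences = [s for s in SENT_SPLIT.split(text.strip()) if WORD.search(s)]
    if not sentences:
        return 0.0
    return sum(len(WORD.findall(s)) for s in sentences) / len(sentences)


def train_unigram(corpus: str) -> tuple[Counter, int]:
    """Hypothetical stand-in LM: lowercase word counts plus total token count."""
    counts = Counter(w.lower() for w in WORD.findall(corpus))
    return counts, sum(counts.values())


def unigram_perplexity(text: str, model: tuple[Counter, int]) -> float:
    """Per-word perplexity of `text` under an add-one-smoothed unigram model."""
    counts, total = model
    vocab = len(counts) + 1  # one extra slot reserves probability for unseen words
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return float("inf")
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


# Hypothetical toy "in-domain" corpora for two grade levels.
grade_models = {
    2: train_unigram("The cat sat. The dog ran. The cat and the dog play."),
    5: train_unigram("Photosynthesis converts light energy into chemical energy."),
}
sample = "The dog and the cat sat. They play."

# Feature vector: [avg sentence length, perplexity under grade 2, under grade 5]
features = [average_sentence_length(sample)] + [
    unigram_perplexity(sample, grade_models[g]) for g in sorted(grade_models)
]
print(features)
```

A lower perplexity under one grade's model suggests the text's vocabulary is closer to that grade's material. Here the sample scores lower under the grade-2 model. In a real system these values would be inputs to a classifier alongside other features, not a grade prediction on their own.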

Similar resources

Automatic Construction of Large Readability Corpora

This work presents a framework for the automatic construction of large Web corpora classified by readability level. We compare different Machine Learning classifiers for the task of readability assessment focusing on Portuguese and English texts, analysing the impact of variables like the feature inventory used in the resulting corpus. In a comparison between shallow and deeper features, the fo...

Cognitively Motivated Features for Readability Assessment

We investigate linguistic features that correlate with the readability of texts for adults with intellectual disabilities (ID). Based on a corpus of texts (including some experimentally measured for comprehension by adults with ID), we analyze the significance of novel discourse-level features related to the cognitive factors underlying our users’ literacy challenges. We develop and evaluate a t...

Qualitative and Quantitative Examination of Text Type Readabilities: A Comparative Analysis

This study compared two main approaches to readability assessment. The quantitative approach applied idea density based on part-of-speech tagging and compared three sets of text types (i.e., narrative, expository, and argumentative) with respect to their ease of reading. The qualitative approach was done through developing questionnaires measuring intermediate EFL learners’ perceptions on content, motivat...

Automatic Readability Evaluation Using a Neural Network 2009 - 2010

Many formulas and methods for assessing the readability of a text or determining the appropriate grade level are inaccurate and based only on surface features of text. In automatic assessment of a text’s reading level, computers can easily run more sophisticated models than simple algebraic formulas; the goal of this project is to create such a model. Indexes and statistics will be computed wit...

Comparing human versus automatic feature extraction for fine-grained elementary readability assessment

Early primary children’s literature poses some interesting challenges for automated readability assessment: for example, teachers often use fine-grained reading leveling systems for determining appropriate books for children to read (many current systems approach readability assessment at a coarser whole grade level). In previous work (Ma et al., 2012), we suggested that the fine-grained assess...

Publication date: 2010